Searching for Ephemeral Subsequences in Strings
Authors
Abstract
Let $T = a_1 \ldots a_n$ be a text where every symbol $a_j$ has a time stamp $t_j$ and a duration $d(a_j)$ associated with it. The time stamps of the $a_j$'s are increasing, so that $j > i$ implies $t_j > t_i$. A text symbol $a_j$ is alive at time $t$ iff $t_j \le t \le t_j + d(a_j)$. A subsequence $a_{i_1} \ldots a_{i_m}$ of $T$ is alive iff every $a_{i_k}$ is alive at time $t_{i_m}$, that is, $t_{i_k} + d(a_{i_k}) \ge t_{i_m}$ for all $k \in \{1, \ldots, m-1\}$. We consider the problem of determining whether a given pattern $P = b_1 \ldots b_m$ occurs as an alive subsequence of $T$. We give an off-line (i.e., the pattern is known in advance) algorithm, running in $O(n+m)$ time. We also introduce and discuss data structures for fast on-line implementation.
Index Terms: Algorithms, pattern matching, ephemeral subsequence, DAWG, forward failure function, intrusion and misuse detection
and Dipartimento di Elettronica e Informatica, Università di Padova, Via Gradenigo 6/A, 35131 Padova, Italy; partially supported by NSF grant CCR-92-01078, by NATO grant CRG 900293, by the National Research Council of Italy, and by the ESPRIT III Basic Research Programme of the EC under contract No. 9072 (Project GEPPCOM).
This author gratefully acknowledges support from the COAST Project at Purdue and its sponsors, in particular Hewlett Packard, DARPA, the National Security Agency, and the Office of Research and Development
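The aliveness condition defined in the abstract can be illustrated with a small brute-force check. This is only a sketch of the problem definition, not the linear-time off-line algorithm or the on-line data structures the abstract refers to: it tries every possible last position and matches the rest of the pattern backwards, so it may take quadratic time. The names `Symbol`, `is_alive_at`, and `find_alive_occurrence`, and the representation of a text symbol as a `(character, time stamp, duration)` tuple, are assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical representation: (character, time stamp t_j, duration d(a_j)).
Symbol = Tuple[str, float, float]


def is_alive_at(sym: Symbol, t: float) -> bool:
    """A symbol with time stamp t_j and duration d is alive at t iff t_j <= t <= t_j + d."""
    _, start, duration = sym
    return start <= t <= start + duration


def find_alive_occurrence(text: Sequence[Symbol], pattern: str) -> Optional[List[int]]:
    """Return 0-based text indices of one alive occurrence of pattern, or None.

    Brute force: fix the position of the last pattern symbol, then scan
    backwards, greedily matching the remaining pattern symbols against text
    symbols that are still alive at the time stamp of that last position.
    """
    if not pattern:
        return []
    for end, (ch, t_end, _) in enumerate(text):
        if ch != pattern[-1]:
            continue
        picked = [end]
        k = len(pattern) - 2  # next pattern symbol to match, right to left
        j = end - 1           # next text position to inspect
        while k >= 0 and j >= 0:
            if text[j][0] == pattern[k] and is_alive_at(text[j], t_end):
                picked.append(j)
                k -= 1
            j -= 1
        if k < 0:
            return picked[::-1]
    return None


if __name__ == "__main__":
    # 'a' lives over [0, 5], 'b' over [1, 2], 'c' over [4, 5].
    sample = [("a", 0, 5), ("b", 1, 1), ("c", 4, 1)]
    print(find_alive_occurrence(sample, "ac"))   # [0, 2]
    print(find_alive_occurrence(sample, "abc"))  # None: 'b' has expired by time 4
```

Greedy backward matching is sufficient here because, once the last position is fixed, the set of text symbols that are still alive is fixed as well, so the question reduces to ordinary subsequence matching over that filtered set.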
Similar resources
A New Family of String Classifiers Based on Local Relatedness
This paper introduces a new family of string classifiers based on local relatedness. We use three types of local relatedness measurements, namely, longest common substrings (LCStr’s), longest common subsequences (LCSeq’s), and window-accumulated longest common subsequences (wLCSeq’s). We show that finding the optimal classifier for two given sets of strings (the positive set and the negative set)...
A Greedy Approach for Computing Longest Common Subsequences
This paper presents an algorithm for computing Longest Common Subsequences of two sequences. Given two strings X and Y of lengths m and n, we present a greedy algorithm that requires O(n log s) preprocessing time, where s is the number of distinct symbols appearing in string Y, and O(m) time to determine the Longest Common Subsequences.
Computing the Number of Longest Common Subsequences
This note provides very simple, efficient algorithms for computing the number of distinct longest common subsequences of two input strings and for computing the number of LCS embeddings.
Strings with Maximally Many Distinct Subsequences and Substrings
A natural problem in extremal combinatorics is to maximize the number of distinct subsequences for any length-n string over a finite alphabet Σ; this value grows exponentially, but slower than $2^n$. We use the probabilistic method to determine the maximizing string, which is a cyclically repeating string. The number of distinct subsequences is exactly enumerated by a generating function, from whi...
Finding Frequent Subsequences in a Set of Texts
Given a set of strings, the Common Subsequence Automaton accepts all common subsequences of these strings. Such an automaton can be deduced from other automata like the Directed Acyclic Subsequence Graph or the Subsequence Automaton. In this paper, we introduce some new issues in text algorithms based on problems related to Common Subsequences. First, we give an overview of different exist...